Yemen


It's the End of the World (And It's Their Fault)

The Atlantic - Technology

It's late morning on a Monday in March and I am, for reasons I will explain momentarily, in a private bowling alley deep in the bowels of a $65 million mansion in Utah. Jesse Armstrong, the showrunner of HBO's hit series Succession, approaches me, monitor headphones around his neck and a wide grin on his face. "I take it you've seen the news," he says, flashing his phone and what appears to be his X feed in my direction. Everyone had: An hour earlier, my boss Jeffrey Goldberg had published a story revealing that U.S. national-security leaders had accidentally added him to a Signal group chat where they discussed their plans to conduct then-upcoming military strikes in Yemen. "Incredibly fucking depressing," Armstrong says.


Three sensitive messages from full Signal chat explained

BBC News

In his message, Waltz congratulates "Pete" - referring to Hegseth - as well as the "IC", shorthand for "intelligence community", and "Kurilla", a reference to Michael Kurilla, the US Army general who oversees Central Command, a regional combatant command with responsibility over the Middle East and parts of Central and South Asia. The messages do not reveal how the target's whereabouts or movements were tracked. A military expert contacted by the BBC - who wished to remain nameless - suggested that aerial platforms, technological tracking capabilities, human intelligence on the ground, or a combination of these sources could have been used. At least 53 people were killed in the initial wave of US airstrikes on Houthi targets in Yemen, which struck more than 30 targets including training facilities, drone infrastructure, weapons manufacturing and storage sites, and command and control centres, including one in which the Pentagon said several unmanned aerial vehicle experts were located. It is unclear which of the targets Waltz was referring to in the group chat.


Palm: A Culturally Inclusive and Linguistically Diverse Dataset for Arabic LLMs

arXiv.org Artificial Intelligence

As large language models (LLMs) become increasingly integrated into daily life, ensuring their cultural sensitivity and inclusivity is paramount. We introduce Palm, a dataset built through a year-long community-driven project covering all 22 Arab countries. The dataset includes instructions (input-response pairs) in both Modern Standard Arabic (MSA) and dialectal Arabic (DA), spanning 20 diverse topics. Built by a team of 44 researchers across the Arab world, all of whom are authors of this paper, the dataset offers a broad, inclusive perspective. We use it to evaluate the cultural and dialectal capabilities of several frontier LLMs, revealing notable limitations: while closed-source LLMs generally exhibit strong performance, they are not without flaws, and smaller open-source models face greater challenges. Moreover, certain countries (e.g., Egypt, the UAE) appear better represented than others (e.g., Iraq, Mauritania, Yemen). Our annotation guidelines, code, and data are publicly available for reproducibility.
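Per-country performance gaps of the kind reported here can be surfaced by grouping graded model outputs by country. The sketch below is a minimal, stdlib-only illustration with hypothetical record fields (`country`, `model_answer`, `reference`), not the authors' actual evaluation code:

```python
from collections import defaultdict

def per_country_accuracy(records):
    """Aggregate model correctness by country to surface
    representation gaps across countries in the dataset."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["country"]] += 1
        correct[r["country"]] += int(r["model_answer"] == r["reference"])
    return {c: correct[c] / total[c] for c in total}

# Toy records standing in for graded model outputs.
records = [
    {"country": "Egypt", "model_answer": "a", "reference": "a"},
    {"country": "Egypt", "model_answer": "b", "reference": "a"},
    {"country": "Yemen", "model_answer": "c", "reference": "d"},
]
scores = per_country_accuracy(records)
```

Sorting the resulting scores would rank countries by how well a given model handles their dialect and cultural content.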


Meta AI adds Arabic support for Middle East and North Africa

ZDNet

As large language models face growing criticism for their lack of language inclusivity beyond the English-dominated West, leading AI companies have started tailoring region-specific LLMs to break this cycle. Now, Meta is riding that wave. Meta is expanding Meta AI across the Middle East and North Africa (MENA), providing language support for millions of Arabic-speaking users in Algeria, Egypt, Iraq, Jordan, Libya, Morocco, Saudi Arabia, Tunisia, the United Arab Emirates (UAE), and Yemen. Similarly, Mistral AI recently released its first Arabic-centric language model, Saba, which is tailored to its growing customer base in Arabic-speaking countries. Meta AI, an AI-powered chatbot and virtual assistant based on Llama 3.2, is available on Facebook, Instagram, WhatsApp, and Messenger.


From Newswire to Nexus: Using text-based actor embeddings and transformer networks to forecast conflict dynamics

arXiv.org Artificial Intelligence

This study advances the field of conflict forecasting by using text-based actor embeddings with transformer models to predict dynamic changes in violent conflict patterns at the actor level. More specifically, we combine newswire texts with structured conflict event data and leverage recent advances in Natural Language Processing (NLP) techniques to forecast escalations and de-escalations among conflicting actors, such as governments, militias, separatist movements, and terrorists. This approach captures the inherently volatile patterns of violent conflict both accurately and promptly, something existing methods have not achieved. To create this framework, we began by curating and annotating a vast international newswire corpus, leveraging hand-labeled event data from the Uppsala Conflict Data Program. With this hybrid dataset, our models can incorporate the textual context of news sources along with the precision and detail of structured event data, enabling both dynamic and granular predictions about conflict developments. We validate our approach through rigorous back-testing against historical events, demonstrating superior out-of-sample predictive power, and find that it identifies and predicts phases of conflict escalation and de-escalation more effectively than traditional models. By focusing on actor interactions, our explicit goal is to provide actionable insights to policymakers, humanitarian organizations, and peacekeeping operations in order to enable targeted and effective intervention strategies.
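The core idea of text-driven escalation labeling can be caricatured in a few lines: embed actor-level news reports, then label a new report by its most similar labeled neighbor. The sketch below uses a bag-of-words embedding and cosine similarity purely as stand-ins; the paper itself uses transformer embeddings and structured event data, neither of which is reproduced here:

```python
import math
from collections import Counter

def embed(text):
    # Bag-of-words counts as a crude stand-in for a transformer embedding.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def forecast(history, report):
    """Label a new actor report by its most similar labeled report."""
    best = max(history, key=lambda h: cosine(embed(h[0]), embed(report)))
    return best[1]

# Toy labeled history standing in for annotated newswire data.
history = [
    ("militia shelled government positions near the capital", "escalation"),
    ("both parties signed a ceasefire and exchanged prisoners", "de-escalation"),
]
label = forecast(history, "rebels shelled army positions near the border")
```

A real system would replace `embed` with a fine-tuned transformer and feed similarity features into a temporal model rather than taking a single nearest neighbor.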


Risk factor identification and classification of malnutrition among under-five children in Bangladesh: Machine learning and statistical approach

arXiv.org Artificial Intelligence

This study aims to understand the factors that resulted in under-five children's malnutrition from the Multiple Indicator Cluster (MICS-2019) nationwide surveys and classify different malnutrition stages based on four well-established machine learning algorithms, namely Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and Multi-layer Perceptron (MLP) neural network. Accuracy, precision, recall, and F1 scores are obtained to evaluate the performance of each model. A statistical Pearson correlation coefficient analysis is also done to understand the significant factors related to a child's malnutrition. The eligible data sample for analysis was 21,858 among 24,686 samples from the dataset. Satisfactory and insightful results were obtained in each case, and the RF and MLP performed extraordinarily well. For RF, the accuracy was 98.55%, average precision 98.3%, recall 95.68%, and F1 score 97.13%. For MLP, the accuracy was 98.69%, average precision 97.62%, recall 90.96%, and F1 score 97.39%. From the Pearson coefficients, all negative correlation results are listed; the most significant impacts are found for WAZ2 (weight-for-age Z score, WHO) (-0.828), WHZ2 (weight-for-height Z score, WHO) (-0.706), ZBMI (BMI Z score, WHO) (-0.656), BD3 (whether the child is still being breastfed) (-0.59), HAZ2 (height-for-age Z score, WHO) (-0.452), CA1 (whether the child had diarrhea in the last 2 weeks) (-0.34), Windex5 (wealth index quintile) (-0.161), melevel (mother's education) (-0.132), and CA14/CA16/CA17 (whether the child had illness with fever, cough, and breathing difficulty) (-0.04), in successive order.
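The evaluation metrics the study reports (precision, recall, F1) and the Pearson correlation coefficient can all be computed from first principles. A minimal stdlib-only sketch on toy labels (not the MICS-2019 data):

```python
import math

def prf1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def pearson(x, y):
    """Pearson correlation coefficient between two numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy predictions standing in for classifier output on held-out data.
p, r, f = prf1(["mal", "norm", "mal", "mal"],
               ["mal", "mal", "mal", "norm"], "mal")
# A perfectly anti-correlated pair, as with Z scores vs. malnutrition risk.
r_neg = pearson([1, 2, 3, 4], [4, 3, 2, 1])
```

In practice one would use library implementations (e.g. scikit-learn's metrics and SciPy's `pearsonr`, which also returns a p-value), but the formulas above are what those report.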


Artificial intelligence contribution to translation industry: looking back and forward

arXiv.org Artificial Intelligence

This study provides a comprehensive analysis of artificial intelligence (AI) contribution to translation industry (ACTI) research, synthesizing it over the period 1980-2024. 13220 articles were retrieved from three sources, namely WoS, Scopus, and Lens. We provide two types of analysis, scientometric and thematic. For the former, we focus on clusters, subject categories, keywords, burstness, centrality, and research centers. For the latter, we thematically review 18 articles, selected purposefully from those retrieved, centering on purpose, approach, findings, and contribution to future ACTI directions. The findings reveal that in the past, AI's contribution to the translation industry was not rigorous, resulting in rule-based and statistical machine translation whose output was not satisfactory. As AI has developed, however, so has machine translation, incorporating neural network algorithms and (deep) language learning models such as ChatGPT, whose translation output has improved considerably. Still, much rigorous research is needed to overcome several problems confronting the translation industry, specifically concerning low-resource languages, multi-dialectal and free-word-order languages, and cultural and religious registers.


Non-native speakers of English or ChatGPT: Who thinks better?

arXiv.org Artificial Intelligence

This study sets out to answer one major question: who thinks better, non-native speakers of English or ChatGPT? It provides evidence, from the processing and interpretation of center-embedded English constructions, that the human brain surpasses ChatGPT and that ChatGPT cannot be regarded as a theory of language. Fifteen non-native speakers of English were recruited as study participants. A center-embedded English sentence was presented to both the participants and ChatGPT. The findings reveal that the human brain is still far ahead of Large Language Models, specifically ChatGPT, even in the case of non-native speakers of an L2, here English. The study concludes that the human brain's ability to process and interpret natural language is unique and that ChatGPT still lags behind it.


What fifty-one years of Linguistics and Artificial Intelligence research tell us about their correlation: A scientometric review

arXiv.org Artificial Intelligence

There is a strong correlation between linguistics and artificial intelligence (AI), best manifested by deep learning language models. This study provides a thorough scientometric analysis of this correlation, synthesizing the intellectual production of 51 years, from 1974 to 2024. It covers 5750 Web of Science-indexed articles published in 2124 journals, written by 20835 authors affiliated with 13773 research centers in 794 countries. Two software packages, CiteSpace and VOSviewer, were used to generate mapping visualizations of the intellectual landscape, trending issues, and (re)emerging hotspots. The results indicate that in the 1980s and 1990s, linguistics and AI research was not robust, characterized by unstable publication output over time. Publications have, however, increased remarkably since then, reaching 1478 articles in 2023 and 546 articles in January-March 2024, involving emerging issues and hotspots, addressing new horizons and new topics, and launching new applications and powerful deep learning language models, including ChatGPT.


A Survey of Event Causality Identification: Principles, Taxonomy, Challenges, and Assessment

arXiv.org Artificial Intelligence

Event Causality Identification (ECI) has become a crucial task in Natural Language Processing (NLP), aimed at automatically extracting causalities from textual data. In this survey, we systematically address the foundational principles, technical frameworks, and challenges of ECI, offering a comprehensive taxonomy to categorize and clarify current research methodologies, as well as a quantitative assessment of existing models. We first establish a conceptual framework for ECI, outlining key definitions, problem formulations, and evaluation standards. Our taxonomy classifies ECI methods according to the two primary tasks of sentence-level (SECI) and document-level (DECI) event causality identification. For SECI, we examine feature pattern-based matching, deep semantic encoding, causal knowledge pre-training and prompt-based fine-tuning, and external knowledge enhancement methods. For DECI, we highlight approaches focused on event graph reasoning and prompt-based techniques to address the complexity of cross-sentence causal inference. Additionally, we analyze the strengths, limitations, and open challenges of each approach. We further conduct an extensive quantitative evaluation of various ECI methods on two benchmark datasets. Finally, we explore future research directions, highlighting promising pathways to overcome current limitations and broaden ECI applications.
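Of the SECI families the survey names, feature pattern-based matching is the simplest to illustrate. The toy sketch below matches a couple of causal connectives with regular expressions; the pattern list is an assumption for illustration, and real pattern-based systems use far richer lexicons and syntactic features:

```python
import re

# Crude causal-connective patterns: "X caused/led to/resulted in Y"
# and "Y because of / due to X". Illustrative only.
PATTERNS = [
    re.compile(r"(?P<cause>.+?)\s+(?:caused|led to|resulted in)\s+(?P<effect>.+)",
               re.IGNORECASE),
    re.compile(r"(?P<effect>.+?)\s+(?:because of|due to)\s+(?P<cause>.+)",
               re.IGNORECASE),
]

def extract_causality(sentence):
    """Return a (cause, effect) pair if a known connective matches, else None."""
    text = sentence.strip().rstrip(".")
    for pat in PATTERNS:
        m = pat.match(text)
        if m:
            return m.group("cause"), m.group("effect")
    return None

pair = extract_causality("The drought led to widespread crop failure.")
```

The brittleness of this approach (implicit causality, negation, cross-sentence links) is exactly what motivates the deep semantic encoding, pre-training, and prompt-based methods the survey goes on to categorize.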